E-Scores for (In)Correctness Assessment of Generative Model Outputs

Dhillon, Guneet S., González, Javier, Pandeva, Teodora, Curth, Alicia

arXiv.org Machine Learning

While generative models, especially large language models (LLMs), are ubiquitous in today's world, principled mechanisms to assess their (in)correctness are limited. Using the conformal prediction framework, previous works construct sets of LLM responses where the probability of including an incorrect response, or error, is capped at a desired user-defined tolerance level. However, since these methods are based on p-values, they are susceptible to p-hacking, i.e., choosing the tolerance level post-hoc can invalidate the guarantees. We therefore leverage e-values to complement generative model outputs with e-scores as a measure of incorrectness. In addition to achieving the same statistical guarantees as before, e-scores provide users flexibility in adaptively choosing tolerance levels after observing the e-scores themselves, by upper bounding a post-hoc notion of error called size distortion. We experimentally demonstrate their efficacy in assessing LLM outputs for different correctness types: mathematical factuality and property constraint satisfaction.
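The core statistical mechanism behind e-scores can be illustrated with a minimal sketch (not the paper's construction): if an e-score has expectation at most 1 when the output is actually correct, then flagging outputs whose e-score exceeds 1/alpha caps the false-flag rate at alpha by Markov's inequality. The function name and toy numbers below are purely illustrative.

```python
import numpy as np

def e_score_flag(e_value, alpha):
    """Flag an output as incorrect when its e-score reaches 1/alpha.

    If E[e_value] <= 1 whenever the output is actually correct, then
    by Markov's inequality P(e_value >= 1/alpha) <= alpha, so correct
    outputs are wrongly flagged with probability at most alpha.
    """
    return e_value >= 1.0 / alpha

# Toy e-scores for a batch of hypothetical LLM responses
e_scores = np.array([0.2, 0.9, 5.0, 12.0, 25.0])

# A user picks tolerance alpha = 0.1, so the threshold is 1/0.1 = 10
flags = e_score_flag(e_scores, alpha=0.1)
print(flags)  # [False False False  True  True]
```

The paper's point about post-hoc validity is that, unlike p-value thresholds, the user may choose alpha after seeing the e-scores while still controlling a size-distortion notion of error; the sketch above only shows the fixed-alpha Markov guarantee.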



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper studies the estimation of the k-dimensional principal subspace of a population matrix based on the sample covariance matrix. Two estimators, based on convex and non-convex optimization respectively, are developed for projection matrices with large or small magnitude entries. Both estimators are shown to enjoy satisfactory theoretical properties and experimental results compared with state-of-the-art estimators. It would be better to clearly explain what oracle knowledge the proposed algorithm uses, and how the oracle-estimator comparison experiments are set up.



Review for NeurIPS paper: A convex optimization formulation for multivariate regression

Neural Information Processing Systems

Weaknesses: The major weaknesses of the paper are listed below: 1. There are some potential inaccuracies in the description of the algorithm. For example, in Section 3.1, the first equalities in the two lines of equations after line 210 should be \approx instead, right? Does the notation p_{\tau_B}' denote the sub-gradient of p_{\tau_B}? In general, some more explanation of the linearization here would be helpful.


A Statistical Theory of Regularization-Based Continual Learning

Zhao, Xuyang, Wang, Huiyuan, Huang, Weiran, Lin, Wei

arXiv.org Machine Learning

We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularization algorithms indexed by matrix-valued hyperparameters, which includes the minimum norm estimator and continual ridge regression as special cases. As more tasks are introduced, we derive an iterative update formula for the estimation error of generalized $\ell_2$-regularized estimators, from which we determine the hyperparameters resulting in the optimal algorithm. Interestingly, the choice of hyperparameters can effectively balance the trade-off between forward and backward knowledge transfer and adjust for data heterogeneity. Moreover, the estimation error of the optimal algorithm is derived explicitly, which is of the same order as that of the oracle estimator. In contrast, our lower bounds for the minimum norm estimator and continual ridge regression show their suboptimality. A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization in continual learning, which may be of independent interest. Finally, we conduct experiments to complement our theory.
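One of the baselines analyzed in the abstract, continual ridge regression, is simple enough to sketch concretely: on each new linear regression task the estimator is regularized toward the previous task's estimate rather than toward zero. This is an illustrative implementation under standard assumptions (shared true parameter, Gaussian noise), not the paper's optimal matrix-valued hyperparameter choice.

```python
import numpy as np

def continual_ridge(tasks, lam, d):
    """Continual ridge regression on a sequence of linear tasks.

    For each task (X, y), solve
        min_w ||y - X w||^2 + lam * ||w - w_prev||^2,
    which has the closed form
        w = (X'X + lam I)^{-1} (X'y + lam w_prev),
    regularizing toward the previous task's estimate.
    """
    w = np.zeros(d)
    for X, y in tasks:
        w = np.linalg.solve(X.T @ X + lam * np.eye(d),
                            X.T @ y + lam * w)
    return w

# Synthetic sequence of three tasks sharing one true parameter
rng = np.random.default_rng(0)
d, n = 5, 50
w_true = rng.normal(size=d)
tasks = []
for _ in range(3):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

w_hat = continual_ridge(tasks, lam=1.0, d=d)
print(np.linalg.norm(w_hat - w_true))  # small estimation error
```

The scalar lam here corresponds to the special case of the matrix-valued hyperparameters in the abstract; the paper's analysis identifies the hyperparameter choice that matches the oracle estimator's rate.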


Analysis of a multi-target linear shrinkage covariance estimator

Oriol, Benoit

arXiv.org Machine Learning

Multi-target linear shrinkage is an extension of standard single-target linear shrinkage for covariance estimation: several constant matrices, the targets, are combined with the sample covariance matrix. We derive the oracle and a \textit{bona fide} multi-target linear shrinkage estimator with exact and empirical mean. In both settings, we prove its convergence towards the oracle under Kolmogorov asymptotics. Finally, we show empirically that it outperforms other standard estimators in various situations.
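The estimator form described in the abstract can be sketched as a convex combination of the sample covariance with several fixed targets. This is a minimal illustration with hand-picked weights; the paper's contribution is deriving the oracle weights and a bona fide data-driven estimate of them.

```python
import numpy as np

def multi_target_shrinkage(S, targets, weights):
    """Multi-target linear shrinkage of a sample covariance S.

    estimator = (1 - sum(weights)) * S + sum_k weights[k] * targets[k]
    The weights here are supplied by hand for illustration; the oracle
    weights in the paper are derived from the data-generating process.
    """
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and w.sum() <= 1
    est = (1.0 - w.sum()) * S
    for wk, Tk in zip(w, targets):
        est = est + wk * Tk
    return est

rng = np.random.default_rng(1)
p, n = 10, 40
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)

# Two classic constant targets: scaled identity and a constant matrix
t1 = np.trace(S) / p * np.eye(p)
t2 = np.full((p, p), S.mean())
est = multi_target_shrinkage(S, [t1, t2], weights=[0.3, 0.1])
print(est.shape)  # (10, 10)
```

With a single scaled-identity target this reduces to the familiar single-target (Ledoit-Wolf-style) shrinkage setting that the abstract generalizes.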


AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional Regression

He, Zelin, Sun, Ying, Liu, Jingyuan, Li, Runze

arXiv.org Machine Learning

We consider the transfer learning problem in the high-dimensional setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a novel fused penalty, coupled with weights that can adapt according to the transferable structure. To choose the weights, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. The non-asymptotic rates are established, which recover existing near-minimax optimal rates in special cases. The effectiveness of the proposed method is validated using both synthetic and real data.
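The fusion idea behind the abstract can be illustrated with a deliberately simplified $\ell_2$ analogue: shrink the target-task estimate toward a source estimate, with a penalty weight governing how much is transferred. The actual F-AdaTrans/S-AdaTrans method uses an $\ell_1$-type fused penalty with feature- or sample-wise adaptive weights; the function below is only a sketch of the transfer trade-off.

```python
import numpy as np

def l2_fused_transfer(X_tgt, y_tgt, w_src, lam):
    """Simplified l2 analogue of a fused penalty for transfer learning:
        min_w ||y_tgt - X_tgt w||^2 + lam * ||w - w_src||^2,
    shrinking the target estimate toward the source estimate w_src.
    (AdaTrans itself uses an l1-type fused penalty with adaptive
    feature-wise or sample-wise weights.)"""
    d = X_tgt.shape[1]
    return np.linalg.solve(X_tgt.T @ X_tgt + lam * np.eye(d),
                           X_tgt.T @ y_tgt + lam * w_src)

rng = np.random.default_rng(2)
d, n_tgt = 20, 10            # high-dimensional: d > n_tgt
w_true = rng.normal(size=d)
w_src = w_true + 0.05 * rng.normal(size=d)   # informative source
X = rng.normal(size=(n_tgt, d))
y = X @ w_true + 0.1 * rng.normal(size=n_tgt)

w_transfer = l2_fused_transfer(X, y, w_src, lam=5.0)
w_no_transfer = l2_fused_transfer(X, y, np.zeros(d), lam=5.0)
err_t = np.linalg.norm(w_transfer - w_true)
err_n = np.linalg.norm(w_no_transfer - w_true)
print(err_t < err_n)  # transfer helps when the source is informative
```

When the source estimate is non-transferable (far from the target parameter), large lam hurts; this is exactly the trade-off the adaptive weights in the abstract are designed to navigate.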